operator fixity and precedence

2022-01-11 · 4 min read

    You come across a Haskell/Idris/LEAN file littered with custom operators (read "delicious operator soup"):

    parseJson = withObject "Foo" $
        \v -> Foo <$> v .:  "name"
                  <*> v .:  "owner"
                  <*> v .:  "deadline"
                  <*> v .:  "status"
                  <*> v .:? "reco"
                  <*> v .:  "benefit"
                  <*> v .:  "cost"
                  <*> v .:  "residual"
    

    How do you parse this without tearing your eyes out?

    If setting fire to your computer and becoming a wheat farmer doesn't sound like a viable career path, then you should start by finding the list of operators for your language. Here are the ones defined in Idris' prelude:

    module Prelude.Ops
    
    -- Numerical operators
    infix 6 ==, /=, <, <=, >, >=
    infixl 8 +, -
    infixl 9 *, /
    
    -- Boolean operators
    infixr 5 &&
    infixr 4 ||
    
    -- List and String operators
    infixr 7 ::, ++
    infixl 7 :<
    
    -- Functor/Applicative/Monad/Algebra operators
    infixl 1 >>=, =<<, >>, >=>, <=<, <&>
    infixr 2 <|>
    infixl 3 <*>, *>, <*
    infixr 4 <$>, $>, <$
    infixl 8 <+>
    
    -- Utility operators
    infixr 9 ., .:
    infixr 0 $
    
    infixl 9 `div`, `mod`
    

    and the documentation for infix(r|l):

    > :doc infix
    Fixity declarations:
    
      Operators can be assigned a priority level and associativity.
      During parsing operators with a higher priority will collect their
      arguments first and the declared associativity will inform how subterms
      are grouped.
    
      For instance the expression `a + b * c * d + e` is parsed as
    
      `(a + ((b * c) * d)) + e` because:
        `(+)` is at level 8 and associates to the left
        `(*)` is at level 9 and associates to the left
    

    To understand how these expressions are parsed, we need to understand the properties of all operators.

    1. Arity (the number of inputs)
    2. Fixity (infix, prefix, postfix)
    3. Associativity (how parentheses are placed)
    4. Precedence (the order of operations between different operators)

    Arity #

    Operator arity refers to the number of inputs of an operator. The most common arities with some examples:

    1. Nullary (zero inputs): less of an operator and more of a constant.
    2. Unary (one input): e.g., boolean negation ! or bitwise-not ~.
    3. Binary (two inputs): e.g., arithmetic +,-,*,/

    Fixity #

    Operator fixity describes the operator's position in the list of arguments. There are three fixities:

    1. Prefix (operator comes before arguments): e.g., typical function application plus 3 4 or forced prefix form (+) 3 4
    2. Infix (operator comes between arguments): e.g., all arithmetic 3 + 4
    3. Postfix (operator comes after arguments): e.g., RPN (Reverse Polish Notation) or concatinative languages 3 4 +

    Associativity #

    Operator associativity determines how the inner-most parentheses are placed when parsing an expression with the same operator (or same precedence).

    For example, if you have the expression a ◯ b ◯ c ◯ d with some operator , there are 5 ways we can parse it:

    1. ((a ◯ b) ◯ c) ◯ d <-- left associative
    2. a ◯ (b ◯ (c ◯ d)) <-- right associative
    3. (a ◯ b) ◯ (c ◯ d)
    4. a ◯ ((b ◯ c) ◯ d)
    5. (a ◯ (b ◯ c)) ◯ d

    In other words, left associative means we place the inner-most parentheses around the left-most pair, while right associative prefers the right-most pair.

    Operators can also be non-associative, which means a ◯ b ◯ c should fail to parse. An example would be relational operators like x <= y <= z, since (x <= y) <= z would fail to type check, as you would be comparing a boolean to a z.

    Precedence #

    Operator precedence allows us to parse expressions containing multiple different operators without ambiguity. Precedence is a number; higher numbers mean higher priority when parsing.

    I like to imagine parsing precedence for an expression in passes. The first pass we parse only operators of precedence 10. The next pass parses precedence 9, and so on. In each pass we ignore all operators below the current precedence level.

    From school, we know that * has a higher precedence than +. In Idris2 you can see * has precedence 9 while + has precedence 8. Consequently, the expression 100 - 5 * 9 + 111 is parsed like

       base expr:    100 - 5 * 9 + 111
    precedence 9:   100 - (5 * 9) + 111
    precedence 8:  (100 - (5 * 9)) + 111